Text File | 1994-05-26 | 7.6 KB | 133 lines | [TEXT/ttxt] |
- TO OPTIMISE, OR NOT TO OPTIMISE, THAT IS THE QUESTION.
-
- (May 13th) When Intel Corp announced the Pentium, one question that
- taxed the company was the extent to which applications needed to be
- recompiled to get the best out of the chip. It wasn't that recompilation
- was needed for the application to run, it was simply that to get the
- last ounce of performance out of the processor, the compiler had to be
- aware of some of the new features that the new chip offered. Intel
- acknowledged that some applications could benefit substantially by being
- optimised for Pentium - on average they could run 30% faster. But surely
- optimising for Pentium meant de-optimising for the i386 and i486
- processors? At this point, the Intel public relations line tends to be
- "Oh No - optimise for Pentium and you will also optimise for the 486".
- At which point the skeptic gets a little skeptical. To be fair, Intel is
- probably right - it is possible to include optimisations for a chip with
- inbuilt pipelining and parallel execution without damaging its speed of
- execution on a simpler processor.
-
- Today the same questions can be asked of the PowerPC family of chips. If
- I optimise my code specifically for the PowerPC 604, how much extra speed
- will I be able to squeeze out of the processor? If I optimise for the
- 601, will I get sub-standard performance if the code runs on a 603 or a
- 604, or vice versa?
-
- A number of compiler writers will point out that at this stage the
- question is a little premature. "Here we are with a family of chips
- that goes blindingly fast and you want to worry about squeezing out the
- last drop of performance", they cry, quite rightly. However, we'll ask
- it anyway, since it is a question that will become important to the
- software community in the next year or so.
-
- As with so many questions, the answer is "it depends", according to Mike
- Phillip, who manages Motorola's RISC compiler tools group. It depends on
- the kind of application, on the design of the target computer system and
- on the compiler. Given all those imponderables, however, he says, full
- optimisation for the 604 might result in a 10% increase in performance.
- The question then becomes: how much will optimising for one processor
- slug performance on the others? Again, says Phillip, there is no simple
- answer, but generally compiling for a chip with a large degree of
- internal parallelism (such as the 604) will not impact a more lowly
- processor too badly.
-
- One way to get a good approximation of the truth would be to run
- optimised SPECmark code on one processor and then on the other.
- Unfortunately, no-one has done this yet, partly because all those
- SPECint and SPECfp figures that you have seen for the 604 seem to have
- been generated by simulation, rather than by compiling actual code. It
- is generally accepted that there will be a performance hit of some kind,
- but nobody has figures. PowerPC News is keen to hear from anyone who has
- run comparative tests.
-
- The degree of parallelism is just one way in which the different members
- of the PowerPC family differ. Apart from the obvious multiple integer
- units, the 604 also has a re-engineered floating-point unit. As we said
- in the last issue, the PPC604 has a single-pass double-precision unit,
- meaning that both single- and double-precision operations zip through
- the chip in one pass with a latency of 3 cycles. Essentially, it has two
- multiply units for double precision operands. The 601 and the 603, by
- contrast, require double precision to travel the pipeline twice, giving
- a 4-cycle latency and 2-cycle throughput. Add the fact that the 604
- now has two "wait station" queueing places in front of the
- floating-point unit, and a compiler writer has one or two things to
- consider when getting the best out of the processor.
-
- Nearly as important for scientific applications is the organisation of
- the cache on the target machine. Different processors have their cache
- arranged in different ways. The 604, for example, has separate data and
- instruction caches on-board, while the 601 unifies them, but that will
- not make too much difference to the compiler, Phillip says. More
- important is the size and organisation of the Level 2 off-chip
- cache. Since commercial application developers will have no idea what
- kind of cache their customers' machines will have, caching will most
- probably be ignored. In any case, says Phillip, it will be the
- scientific, memory-bound applications that will notice the difference.
-
- Last but not least, there are the differences in instruction set between
- the various processors. The most trivial example of this is the extra
- instructions that were retained from the old POWER architecture in the
- PowerPC 601. For more details, have a look at the History of the PowerPC
- part 1 (select 3000). However, the fact is the 601 includes many
- instructions which are not strictly part of the PowerPC set, but are
- there to provide a bridge between the two architectures. The advantage
- is that software from an old RS/6000 will usually run fine, without a
- recompile on the 601-based RS/6000s. Theoretically, a piece of code
- aggressively optimised for the 601 might take some of these "bridging"
- instructions, causing all kinds of nasties when it runs on the 604 or
- 603, as the processor could trap the illegal instructions.
-
- This, admittedly, is a daft example, though there is probably someone out
- there who is trying to make a 601 application go faster using these
- methods. It is worth noting, however, that both the 603 and 604
- processors also have additional instructions in them, which are not
- included in the 601. These are the so-called "graphics instructions":
-
- stfiwx  - Store Floating-Point as Integer Word Indexed
- fres    - Floating Reciprocal Estimate Single
- frsqrte - Floating Reciprocal Square Root Estimate
- fsel    - Floating-Point Select
-
- Examples of how these instructions can be used in graphical applications
- are given in Appendix E of the PowerPC documentation and they are
- basically quick 'n dirty ways of carrying out floating-point operations
- where speed is more important than accuracy. Useful as they are, they
- are not supported by the 601 processor and will trap as illegal
- instructions. So
- can they be safely used at all?
-
- Currently, the wisdom from the IBM and Motorola compiler divisions is
- that these kinds of instructions will not be generated by everyday
- compilers; instead, they will appear in specialised graphics libraries.
- Phillip suggests that people may adopt a method of using dynamically
- linkable libraries, which can be swapped in at run-time depending on the
- processor in use. If they are restricted to specialist graphical library
- use, there should not be too many problems. However, other informed
- IBMers suggest that these instructions will become more pervasive over
- time as the focus switches to floating point usage in general
- applications.
-
- So what is the answer? Motorola's Phillip suggests one solution may be
- smart installers. In Motorola's compilers, like many others, switches
- abound. Developers can compile for particular targets, and for
- particular chips. They also allow for the production of multiple sets of
- object code. So, an installation CD-ROM could hold multiple copies of
- (parts of?) the application, together with a smart installer that could
- read the system configuration, processor type, etc., and copy the most
- suitable code across. It is an evolution of the fat binary idea. Will it
- catch on? We don't know because until definitive SPECmarks or other
- benchmarks are run, it is not certain whether there is even an issue.
- Watch out for the launch of the first 604-based Macintoshes, however, as
- that is when we will see the subject appear again.
-
- (c)PowerPC News - Free by mailing: add@power.globalnews.com
-
-